A Simple Introduction to Maximum Entropy Models for Natural Language Processing
نویسنده
چکیده
Many problems in natural language processing can be viewed as lin guistic classi cation problems in which linguistic contexts are used to pre dict linguistic classes Maximum entropy models o er a clean way to com bine diverse pieces of contextual evidence in order to estimate the proba bility of a certain linguistic class occurring with a certain linguistic con text This report demonstrates the use of a particular maximum entropy model on an example problem and then proves some relevant mathemat ical facts about the model in a simple and accessible manner This report also describes an existing procedure called Generalized Iterative Scaling which estimates the parameters of this particular model The goal of this report is to provide enough detail to re implement the maximum entropy models described in Ratnaparkhi Reynar and Ratnaparkhi Ratnaparkhi and also to provide a simple explanation of the max imum entropy formalism Introduction Many problems in natural language processing NLP can be re formulated as statistical classi cation problems in which the task is to estimate the probability of class a occurring with context b or p a b Contexts in NLP tasks usually include words and the exact context depends on the nature of the task for some tasks the context b may consist of just a single word while for others b may consist of several words and their associated syntactic labels Large text corpora usually contain some information about the cooccurrence of a s and b s but never enough to completely specify p a b for all possible a b pairs since the words in b are typically sparse The problem is then to nd a method for using the sparse evidence about the a s and b s to reliably estimate a probability model p a b Consider the Principle of Maximum Entropy Jaynes Good which states that the correct distribution p a b is that which maximizes en tropy or uncertainty subject to the constraints which represent evidence i e the facts known to the experimenter Jaynes discusses its advan tages in making inferences on the basis of partial information we must use that probability distribution which has maximum entropy sub ject to whatever is known This is the only unbiased assignment we can make to use any other would amount to arbitrary assumption of information which by hypothesis we do not have More explicitly if A denotes the set of possible classes and B denotes the set of possible contexts p should maximize the entropy H p X
منابع مشابه
Fast parameter estimation for joint maximum entropy language models
This paper discusses efficient parameter estimation methods for joint (unconditional) maximum entropy language models such as whole-sentence models. Such models are a sound framework for formalizing arbitrary linguistic knowledge in a consistent manner. It has been shown that general-purpose gradient-based optimization methods are among the most efficient algorithms for estimating parameters of...
متن کاملMaximum Entropy Models and Prepositional Phrase Ambiguity
Prepositional phrases are a common source of ambiguity in natural language and many approaches have been devised to resolve this ambiguity automatically. In particular, several different machine learning approaches have now reached accuracy rates of around 84.5% on the benchmark dataset. Maximum entropy (maxent) models, despite their successful application in many other areas of natural languag...
متن کاملSentence Boundary Detection for Handwritten Text Recognition
In the larger context of handwritten text recognition systems many natural language processing techniques can potentially be applied to the output of such systems. However, these techniques often assume that the input is segmented into meaningful units, such as sentences. This paper investigates the use of hidden-event language models and a maximum entropy based method for sentence boundary det...
متن کاملA Maximum Entropy Approach to Natural Language Processing
The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper, we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihoo...
متن کامل